An overview of {openair}

Published

December 5, 2022

Abstract
Learning R can be daunting for the novice but this does not mean that openair cannot be used in a basic way. When openair was first developed it was designed to be used by non-experts in R. In this document we will cover some basic usage to see how air quality data can be imported easily and some plots produced. Even if you do not want to learn much R, you can still do a lot with the minimum of knowledge!

1 Introduction

An important aspect of R (and Python) is that, compared with Excel, it is ‘fussy’ about how data are structured, and this takes new users some getting used to. Ultimately though, this is a good thing. Once data are correctly formatted, openair can quickly yield many types of useful analysis, which this document aims to demonstrate.

Note

The most common stumbling block to using openair is getting the data correctly formatted!
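As a rough illustration of what ‘correctly formatted’ usually means, the sketch below builds a tiny data frame by hand. It relies on the openair conventions of a `date` column stored as a date-time (POSIXct) and, for wind-based plots, columns named `ws` and `wd`. The values and the raw date format are made up for illustration.

```r
# Hypothetical raw data, as it might arrive from a spreadsheet export
raw <- data.frame(
  date = c("01/01/2018 00:00", "01/01/2018 01:00", "01/01/2018 02:00"),
  ws   = c(3.2, 2.8, 4.1),    # wind speed (m/s)
  wd   = c(270, 255, 240),    # wind direction (degrees from north)
  no2  = c(41.5, 38.0, 35.2)  # NO2 concentration (ug/m3)
)

# Convert the text dates to POSIXct; the format string must match the raw text
raw$date <- as.POSIXct(raw$date, format = "%d/%m/%Y %H:%M", tz = "GMT")

# Inspect the column types: date should now be POSIXct, the rest numeric
str(raw)
```

If the `format` string does not match the raw text, the dates become `NA`, which is a common cause of the formatting problems mentioned above.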

Start by loading the openair package.
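Loading is the standard R library call. The commented install line is only needed once and assumes the package is being installed from CRAN.

```r
# install.packages("openair")  # run once if the package is not yet installed
library(openair)
```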

2 Import one year of data for the London Marylebone roadside site

Importing data in this way also provides hourly estimates of wind speed and wind direction from the WRF regional meteorological model. These data should be available from part way through 2010 to the present day.

Note

As an aside, it can be very effective to organise access to air quality data in this way. In the UK, data from hundreds of sites can easily be accessed.

It is easy to do this in openair with one short line of code:

london_road <- importAURN(site = "my1", year = 2018)

This is a good example of how openair functions are called in R. In this case, only two pieces of information are required: the site code of interest and the year. That’s it.
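The same pattern extends to several sites or years. This is a hedged sketch: it assumes `site` and `year` accept vectors, as described on the importAURN() help page, and uses "kc1" (London North Kensington) purely as an illustrative second site code. It needs an internet connection.

```r
library(openair)

# Two sites and two years; the result is a single data frame
# that includes a column identifying each site
london_sites <- importAURN(site = c("my1", "kc1"), year = 2017:2018)

# Check which sites were returned
unique(london_sites$site)
```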

3 Quick summary of the data

summaryPlot(london_road)

Figure 1: A summary of the data produced using openair::summaryPlot().

4 Produce a time series of NO2 concentrations

timePlot(london_road, 
         pollutant = "no2", 
         ylab = "no2 (ug/m3)")

Figure 2: A timeseries produced using openair::timePlot().

openair will try to format common pollutant names and units properly.

timePlot(london_road, 
         pollutant = "no2", 
         avg.time = "day")

Figure 3: An openair::timePlot() timeseries, averaged to daily means.
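To make clear what `avg.time = "day"` does conceptually, here is a base R sketch with made-up hourly values. It is not openair's own implementation, which also deals with matters such as data capture thresholds and wind direction averaging. It only shows hourly values being collapsed to daily means.

```r
# Two days of made-up hourly NO2 data: 10 on day one, 30 on day two
hourly <- data.frame(
  date = seq(as.POSIXct("2018-01-01 00:00", tz = "GMT"),
             by = "hour", length.out = 48),
  no2  = rep(c(10, 30), each = 24)
)

# Label each hour with its calendar day, then average within each day
hourly$day <- as.Date(hourly$date, tz = "GMT")
daily <- aggregate(no2 ~ day, data = hourly, FUN = mean)

daily
```

Within openair itself, timeAverage() is the function that returns averaged data as a data frame rather than a plot.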

5 Plot data in calendar format

Choose some nice colours as well.

calendarPlot(london_road, 
             pollutant = "no2", 
             cols = "viridis")

Figure 4: A calendar heatmap produced using openair::calendarPlot().

Which days had concentrations > 100 µg m⁻³? … a more complicated example with additional options included.

calendarPlot(london_road, 
             pollutant = "no2", 
             breaks = c(0, 100, 500), 
             labels = c("0 to 100", "> 100"),
             cols = c("turquoise4", "deeppink"))

Figure 5: A calendar heatmap produced using openair::calendarPlot(), this time binned into different air quality domains.
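The `breaks` and `labels` arguments pair up much as they do in base R's cut(): n break points define n - 1 intervals, and each interval needs one label. Here is a small sketch with made-up daily values. How openair treats a value that falls exactly on a break is not covered here, so the example avoids such values.

```r
# Made-up daily mean NO2 concentrations (ug/m3)
daily_no2 <- c(45, 82, 130, 97, 210)

# Three break points give two intervals, so two labels are needed
bins <- cut(daily_no2,
            breaks = c(0, 100, 500),
            labels = c("0 to 100", "> 100"))

# Count how many days fall into each category
table(bins)
```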

7 Plot a wind rose

windRose(london_road)

Figure 9: A wind rose produced using openair::windRose().

8 Plot a polar plot

polarPlot(london_road, 
          pollutant = "no2", 
          cols = "plasma")

Figure 10: A polar plot of NO2 produced using openair::polarPlot().

9 Proportion plot

timeProp(london_road, 
         pollutant = "no2", 
         proportion = "wd", 
         avg.time = "week")

Figure 11: A ‘time proportion’ plot produced using openair::timeProp().

10 The {openair} type option

Being able to look at the dependencies of pollutant concentrations on other factors is immensely useful. It can be very illuminating to see how a pollutant varies by season, hour of the day, day of the week, cloud cover, other pollutants and so on. Being able to consider these dependencies quickly and efficiently greatly helps analysis and also leads to a more question-led, interactive approach.

However, we don’t want to spend ages processing data! Here’s a quick example:

pollutionRose(london_road, 
              type = "season", 
              pollutant = "no2")

Figure 12: A pollution rose, demonstrating the openair type option.

And a brief summary of in-built types:

  • “year” splits data by year
  • “month” splits variables by month of the year
  • “monthyear” splits data by year and month
  • “season” splits variables by season. Note in this case the user can also supply a hemisphere option that can be either “northern” (default) or “southern”, so in Australia / New Zealand you will want hemisphere = "southern".
  • “weekday” splits variables by day of the week
  • “weekend” splits variables by Saturday, Sunday, weekday
  • “daylight” splits variables by night-time/daytime. Note the user must supply a longitude and latitude
  • “dst” splits variables by daylight saving time and non-daylight saving time (see manual for more details)
  • “wd” if wind direction (wd) is available type = "wd" will split the data up into 8 sectors: N, NE, E, SE, S, SW, W, NW.
  • “seasonyear” (or “yearseason”) will split the data into year-season intervals, keeping the months of a season together. For example, December 2010 is considered as part of winter 2011 (with January and February 2011). This makes it easier to consider contiguous seasons. In contrast, type = "season" will just split the data into four seasons regardless of the year.
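The “seasonyear” rule can be sketched in base R as follows. This is an illustration of the logic only, not openair's own code. It assumes northern hemisphere meteorological seasons, and the variable names are hypothetical.

```r
# Illustrative dates spanning one winter
dates <- as.Date(c("2010-11-30", "2010-12-15", "2011-01-10",
                   "2011-02-28", "2011-03-01"))

month <- as.integer(format(dates, "%m"))
year  <- as.integer(format(dates, "%Y"))

# Map month number (1 to 12) to a meteorological season
season_lookup <- c("winter", "winter", "spring", "spring", "spring",
                   "summer", "summer", "summer", "autumn", "autumn",
                   "autumn", "winter")
season <- season_lookup[month]

# December is counted with the following year's winter
season_year <- ifelse(month == 12, year + 1L, year)

data.frame(dates, label = paste(season, season_year))
```

December 2010 ends up labelled "winter 2011", alongside January and February 2011, so the season stays contiguous.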

If a categorical variable is present in a data frame, e.g. site, then that variable can be used directly, e.g. type = "site".

type can also be a numeric variable. In this case the numeric variable is split into four quantile-based intervals, i.e. four partitions containing equal numbers of points. Note the user can supply the option n.levels to indicate how many intervals to use.

pollutionRose(london_road, 
              pollutant = "o3", 
              type = "nox", 
              grid.line = 10)

Figure 13: A pollution rose for O3 for different intervals of NOx concentrations (quantiles).
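The quantile split behind a numeric `type` can be sketched in base R. This is not openair's implementation; the data are simulated, and `n_levels` is a hypothetical variable standing in for the `n.levels` option.

```r
set.seed(42)

# Simulated, skewed NOx-like concentrations
nox <- rexp(1000, rate = 1 / 50)

# Number of partitions, mirroring the n.levels option
n_levels <- 4

# Break points at the 0, 25, 50, 75 and 100 % quantiles
breaks <- quantile(nox, probs = seq(0, 1, length.out = n_levels + 1))

# Assign each value to its quantile interval
nox_bin <- cut(nox, breaks = breaks, include.lowest = TRUE)

# Each interval holds the same number of points
table(nox_bin)
```

Because the break points follow the data rather than fixed concentrations, the intervals are narrow where values are common and wide where they are sparse.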

What’s missing to make this more useful? The availability of surface meteorological data massively increases the types of analysis that can be carried out. We can also easily access surface measurements which will probably be more accurate than modelled data. This is something we will come back to.

Also, what about site metadata such as site classification, pollutants measured etc? That is also something that can be easily accessed in openair.
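A minimal sketch of retrieving site metadata, assuming the importMeta() function with `source = "aurn"` for the UK national network. It needs an internet connection, and the columns returned may differ between openair versions.

```r
library(openair)

# Download metadata for the UK AURN network
aurn_meta <- importMeta(source = "aurn")

# Look at the first few sites
head(aurn_meta)
```

The help page for importMeta() also describes an `all` argument for returning more detailed information.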